Abstract In recent years, the study of influence propagation in social networks has gained tremendous attention. In this context, we can identify three orthogonal dimensions: the number of seed nodes activated at the beginning (known as budget), the expected number of activated nodes at the end of the propagation (known as expected spread or coverage), and the time taken for the propagation. We can constrain one or two of these and try to optimize the third. In their seminal paper, Kempe, Kleinberg and Tardos constrained the budget, left time unconstrained, and maximized the coverage: this problem is known as Influence Maximization (or MAXINF for short).

In this paper, we study alternative optimization problems which are naturally motivated by resource and time constraints on viral marketing campaigns. In the first problem, termed Minimum Target Set Selection (or MINTSS for short), a coverage threshold $\eta$ is given and the task is to find the minimum size seed set such that by activating it, at least $\eta$ nodes are eventually activated in the expected sense. This naturally captures the problem of deploying a viral campaign on a budget. In the second problem, termed MINTIME, the goal is to minimize the time in which a predefined coverage is achieved. More precisely, in MINTIME, a coverage threshold $\eta$ and a budget threshold $k$ are given, and the task is to find a seed set of size at most $k$ such that by activating it, at least $\eta$ nodes are activated in the expected sense, in the minimum possible time. This problem addresses the issue of timing when deploying viral campaigns. Both these problems are NP-hard, which motivates our interest in their approximation.

For MINTSS, we develop a simple greedy algorithm 
and show that it provides a bicriteria approximation. 
We also establish a generic hardness result suggesting 
that improving this bicriteria approximation is likely to 
be hard. For MINTIME, we show that even bicriteria 
and tricriteria approximations are hard under several 
conditions. We show, however, that if we allow the budget $k$ for the number of seeds to be boosted by a logarithmic factor and allow the coverage to fall short, then the problem can be solved exactly in PTIME, i.e., we can achieve the required coverage within the time achieved by the optimal solution to MINTIME with budget $k$ and coverage threshold $\eta$.

Finally, we establish the value of the approximation 
algorithms, by conducting an experimental evaluation, 
comparing their quality against that achieved by vari- 
ous heuristics. 

Keywords Social Networks • Social Influence • Influence Propagation • Viral Marketing • Approximation Analysis • MINTSS • MINTIME



1 Introduction 



The study of how influence and information prop[…]tral problems in this domain is the problem of influence maximization (Kempe et al. 2003). Consider a social network in which we have accurate estimates of influence among users. Suppose we want to launch a new product in the market by targeting a set of influential users (e.g., by offering them the product at a discounted price), with the goal of starting a word-of-mouth viral propagation, exploiting the power of social connectivity. The idea is that by observing its neighbors adopting the product, or more generally, performing an action, a user may be influenced to perform the same action, with some probability. Influence thus propagates in steps according to one of the propagation models studied in the literature, e.g., the independent cascade (IC) or the linear threshold (LT) models (Kempe et al. 2003). The propagation stops when no new user gets activated.

In this context, we can identify three main dimensions: the number of seed nodes (or users) activated at the beginning (known as the budget), the expected number of nodes that eventually get activated (known as coverage or expected spread¹), and the number of time steps required for the propagation. In their seminal paper, Kempe, Kleinberg, and Tardos (2003) introduced the problem of Influence Maximization (MAXINF), which asks for a seed set within a budget threshold $k$ that maximizes the expected spread (time being left unconstrained). They showed that under the standard propagation models IC and LT, MAXINF is NP-hard, but that a simple greedy algorithm that exploits properties of the propagation function yields a $(1 - 1/e - \phi)$-approximation, for any $\phi > 0$ (as discussed in detail in Section 2).

In this paper, we explore the other dimensions of 
influence propagation. The problem of Minimum Tar- 
get Set Selection (MINTSS) is motivated by the ob- 
servation that in a viral marketing campaign, we may 
be interested in the smallest budget that will achieve 
a desired outcome. The problem can therefore be defined as follows. We are given a threshold $\eta$ for the expected spread and the problem is to find a seed set of minimum size such that activating the set yields an expected spread of at least $\eta$.

In both MINTSS and MAXINF, the time for propagation is not considered. Indeed, with the exception of a few papers (see, e.g., Leskovec et al. 2007), the temporal dimension of the social propagation phenomenon has been largely overlooked. This is surprising, as the timeliness of a viral marketing campaign is a key ingredient for its success. Beyond viral marketing, many other applications in time-critical domains can exploit social networks as a means of communication to spread information quickly. This motivates the problem of Minimum Propagation Time (MINTIME), defined as follows: given a budget $k$ and a coverage threshold $\eta$, find a seed set that satisfies the given budget and achieves the desired coverage in as little time as possible. Thus, MINTIME tries to optimize the propagation time required to achieve a desired coverage under a given budget.

¹ We use the terms coverage and expected spread interchangeably throughout the article.



1.1 Our Contributions 

We now summarize the main results in this paper. 

• Firstly, we show (Section 4, Theorem 1) that for all instances of MINTSS where the coverage function is monotone and submodular, a simple greedy algorithm yields a bicriteria approximation: given a coverage threshold $\eta$ and a shortfall parameter $\epsilon > 0$, the greedy algorithm will produce a solution $S$ with $\sigma(S) \ge \eta - \epsilon$ and $|S| \le (1 + \ln(\eta/\epsilon))\,OPT$, where $OPT$ is the optimal size of a seed set whose coverage is at least $\eta$. That is, the greedy solution exceeds the optimal solution in terms of size (budget) by a logarithmic factor while achieving a coverage that falls short of the required coverage by at most the shortfall parameter. We prove a generic hardness result (Section 4, Theorem 3) suggesting that improving this approximation factor is likely to be hard.

• For MINTIME under the IC and LT models (or any model with monotone submodular coverage functions), we show that when we allow the coverage achieved to fall short of the threshold and the budget $k$ for the number of seed nodes to be overrun by a logarithmic factor, then we can achieve the required coverage in the minimum possible propagation time, i.e., in the time achieved by the optimal solution to MINTIME with budget threshold $k$ and coverage threshold $\eta$ (Section 5, Theorem 6).

• On the other hand, for MINTIME under the IC model, we show that even bicriteria and tricriteria approximations are hard. More precisely, let $R_{opt}$ be the optimal propagation time required for achieving a coverage $\ge \eta$ within a budget of $k$. Then we show the following (Section 5, Theorem 4): there is unlikely to be a PTIME algorithm that finds a seed set with size under the budget which achieves a coverage better than $(1 - 1/e)\eta$. Similarly, if we limit the budget overrun factor to less than $\ln \eta$, then it is unlikely that there is a PTIME algorithm that finds a seed set of size within the overrun budget which achieves the required coverage $\eta$. In both cases, the result holds even when we permit any amount of slack in the resulting propagation time.

• The above results are bicriteria bounds, in that they allow slack in two of the three parameters governing MINTIME problems. We also show a tricriteria hardness result (Section 5, Theorem 5). Namely, if we limit the budget overrun factor to $\beta < \ln \eta$, then it is unlikely that there is a PTIME algorithm that finds a seed set with a size within a factor $\beta$ of the budget that achieves a coverage better than $(1 - 1/e^{\beta})\eta$. Similar bounds hold if we place hard limits on the coverage approximation and try to balance overrun in the other parameters.

• Often, the coverage function can be hard to compute exactly. This is the case for both the IC and LT models (Kempe, Kleinberg, and Tardos 2003). All our results are robust in that they carry over even when only estimates of the coverage function are available.

• We show the value of our approximation algorithms by experimentally comparing their quality with that of several heuristics proposed in other contexts, using two real data sets. We discuss our findings in Section 6.

The necessary background is given in Section 2, while related work is discussed in Section 3. Section 7 concludes the paper and discusses interesting open problems.



2 Preliminaries 

Suppose we are given a social network together with 
the estimates of mutual influence between individuals 
in the network, and suppose that we want to push a 
new product in the market. The mining problem of in- 
fluence maximization is the following: given such a net- 
work with influence estimates, how to select the set of 
initial users so that they eventually influence the largest 
number of users in the social network. This problem 
has received a good deal of attention in the data min- 
ing and the theoretical computer science communities 
in the last decade. 

The first to consider the propagation of influence and the problem of identification of influential users from a data mining perspective are Domingos and Richardson (2001) and Richardson and Domingos (2002). The problem is modelled by means of Markov random fields, and heuristics are given for choosing the users to target. In particular, the marketing objective function to maximize is the global expected lift in profit, that is, intuitively, the difference between the expected profit obtained by employing a marketing strategy and the expected profit obtained using no marketing at all.

A Markov random field is an undirected graphical model representing the joint distribution over a set of random variables, where nodes are variables and edges represent dependencies between variables. It is adopted in the context of influence propagation by modelling only the final state of the network at convergence as one large global set of interdependent random variables.


Kempe et al. (2003) tackle roughly the same problem as a problem in discrete optimization. They obtain provable approximation guarantees under various propagation models studied in mathematical sociology, as we describe next.

A social network can be represented as a directed 
graph G = (V, E). Every node is in one of two states - 
active or inactive. Here, "active" may correspond to a 
user buying a product or getting infected. In progressive models, it is assumed that once a node becomes active, it remains active. Influence is assumed to propagate
from nodes to their neighbors according to a propaga- 
tion model, and a node's tendency to become active in- 
creases monotonically as more of its neighbors become 
active. 

In the independent cascade (IC) model, each active neighbor $v$ of a node $u$ has one shot at influencing $u$ and succeeds with probability $p_{v,u}$, the probability with which $v$ influences $u$. In the linear threshold (LT) model, each node $u$ is influenced by each neighbor $v$ according to a weight $b_{v,u}$, such that the sum of incoming weights to $u$ is no larger than 1. Each node $u$ chooses a threshold $\theta_u$ uniformly at random from the interval $[0, 1]$. If at timestamp $t$, the total weight from the active neighbors of $u$ attains the threshold $\theta_u$, then $u$ will become active at timestamp $t + 1$. In both models, the process repeats until no new node becomes active.
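To make the two propagation models concrete, here is a minimal Python sketch that simulates one run of each model and estimates the expected spread by Monte Carlo averaging. The graph encoding (a dict mapping each node to a list of `(neighbor, weight)` pairs, where the weight plays the role of $p_{v,u}$ for IC and $b_{v,u}$ for LT) and all function names are assumptions made for illustration, not code from the paper.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: node v -> list of (out-neighbor u, weight on arc (v, u)).
Graph = Dict[int, List[Tuple[int, float]]]


def simulate_ic(graph: Graph, seeds: Set[int], rng: random.Random) -> Set[int]:
    """One run of the independent cascade model; returns the final active set."""
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active in the previous step
    while frontier:
        newly_active = []
        for v in frontier:
            for u, p_vu in graph.get(v, []):
                # v has a single chance to activate each inactive out-neighbor u.
                if u not in active and rng.random() < p_vu:
                    active.add(u)
                    newly_active.append(u)
        frontier = newly_active
    return active


def simulate_lt(graph: Graph, seeds: Set[int], rng: random.Random) -> Set[int]:
    """One run of the linear threshold model; returns the final active set."""
    incoming: Dict[int, List[Tuple[int, float]]] = {}
    nodes = set(graph) | set(seeds)
    for v, arcs in graph.items():
        for u, b_vu in arcs:
            incoming.setdefault(u, []).append((v, b_vu))
            nodes.add(u)
    # Thresholds drawn uniformly from (0, 1], so a node with no active in-weight never activates.
    theta = {u: 1.0 - rng.random() for u in nodes}
    active = set(seeds)
    while True:
        # Synchronous step: each inactive node compares its active in-weight to its threshold.
        newly_active = [
            u for u in nodes - active
            if sum(b for v, b in incoming.get(u, []) if v in active) >= theta[u]
        ]
        if not newly_active:
            return active
        active.update(newly_active)


def estimate_spread(graph: Graph, seeds: Set[int],
                    simulate: Callable[[Graph, Set[int], random.Random], Set[int]],
                    runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected spread sigma_m(S)."""
    rng = random.Random(seed)
    return sum(len(simulate(graph, seeds, rng)) for _ in range(runs)) / runs
```

For example, on the chain `{0: [(1, 1.0)], 1: [(2, 1.0)]}` with every weight equal to 1, both models activate all three nodes from the seed `{0}`, so `estimate_spread(g, {0}, simulate_ic)` returns 3.0.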

For any propagation model, the expected influence spread of a seed set $S$ is the expected number of nodes that eventually get activated by initially activating the nodes in $S$. We denote this number by $\sigma_m(S)$, where $m$ stands for the underlying propagation model. Then the influence maximization problem is defined as follows. Given a directed and edge-weighted social graph $G = (V, E)$, a propagation model $m$, and a number $k \le |V|$, find a set $S \subseteq V$, $|S| = k$, such that $\sigma_m(S)$ is maximum.

Under both the IC and LT propagation models, this problem is shown to be NP-hard (Kempe et al. 2003). However, for both the propagation models described above, the expected influence spread function $\sigma_m(\cdot)$ is monotone and submodular. Monotonicity says that as the set of activated nodes grows, the likelihood of a node getting activated should not decrease. More precisely, a function $f$ from sets to reals is monotone if $f(S) \le f(T)$ whenever $S \subseteq T$. A function $f$ is submodular if $f(S \cup \{w\}) - f(S) \ge f(T \cup \{w\}) - f(T)$ whenever $S \subseteq T$ and $w \notin T$. Submodularity intuitively says that an active node's probability of activating some inactive node $u$ does not increase if more nodes have already attempted to activate $u$ and $u$ is hence more "marketing-saturated". It is also called the law of "diminishing returns".²

Algorithm 1 Greedy MAXINF
Input: $G, k, \sigma_m$
Output: seed set $S$
1: $S \leftarrow \emptyset$
2: while $|S| < k$ do
3:   $u \leftarrow \arg\max_{w \in V \setminus S} (\sigma_m(S \cup \{w\}) - \sigma_m(S))$
4:   $S \leftarrow S \cup \{u\}$
5: end while
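The monotonicity and submodularity definitions above can be checked by brute force on a tiny ground set. The Python sketch below is exponential in the ground-set size, so it is for illustration only, and its helper names are hypothetical. It confirms that a set-coverage function is monotone and submodular, while $f(S) = |S|^2$ is monotone but not submodular.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Hashable, Iterable, Iterator

SetFunction = Callable[[FrozenSet[Hashable]], float]


def subsets(items: Iterable[Hashable]) -> Iterator[FrozenSet[Hashable]]:
    """Enumerate all subsets of a finite collection."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_monotone(f: SetFunction, ground: Iterable[Hashable]) -> bool:
    """Check f(S) <= f(T) for every pair S subset of T."""
    return all(f(s) <= f(t) for t in subsets(ground) for s in subsets(t))


def is_submodular(f: SetFunction, ground: Iterable[Hashable], tol: float = 1e-12) -> bool:
    """Check f(S + w) - f(S) >= f(T + w) - f(T) for all S subset of T and w not in T."""
    ground = frozenset(ground)
    for t in subsets(ground):
        for s in subsets(t):
            for w in ground - t:
                if f(s | {w}) - f(s) < f(t | {w}) - f(t) - tol:
                    return False
    return True


# Example: the number of elements covered by a chosen family of sets.
SETS = {0: {1, 2}, 1: {2, 3}, 2: {4}}


def coverage(s: FrozenSet[Hashable]) -> int:
    return len(set().union(*(SETS[i] for i in s)))


def squared_size(s: FrozenSet[Hashable]) -> int:
    return len(s) ** 2
```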

Thanks to these two properties, we can have a simple greedy algorithm (see Algorithm 1) for influence maximization which provides an approximation guarantee. In fact, for any monotone submodular function $f$ with $f(\emptyset) = 0$, the problem of finding a set $S$ of size $k$ such that $f(S)$ is maximum can be approximated to within a factor of $(1 - 1/e)$ by the greedy algorithm (Nemhauser et al. 1978). This result carries over to the influence maximization problem (Kempe et al. 2003), meaning that the seed set we produce using Algorithm 1 is guaranteed to have an expected spread of at least $(1 - 1/e)$, i.e., more than 63%, of the expected spread of the optimal seed set.
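Algorithm 1 can be sketched in Python for any set function given as an oracle. The sketch below assumes that the exact value of $f$ is available (a small set-coverage function stands in for $\sigma_m$) and compares the greedy result with a brute-force optimum. The names `greedy_max`, `brute_force_max` and `GREEDY_RATIO` are illustrative.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Set

# Guarantee of Nemhauser et al. for monotone submodular f with f(empty set) = 0.
GREEDY_RATIO = 1 - 1 / math.e


def greedy_max(f: Callable[[FrozenSet[Hashable]], float],
               ground: Iterable[Hashable], k: int) -> FrozenSet[Hashable]:
    """Algorithm 1: repeatedly add the element with the largest marginal gain."""
    ground = list(ground)
    s: FrozenSet[Hashable] = frozenset()
    while len(s) < k:
        candidates = [w for w in ground if w not in s]
        if not candidates:
            break  # fewer than k elements exist
        u = max(candidates, key=lambda w: f(s | {w}) - f(s))
        s = s | {u}
    return s


def brute_force_max(f, ground, k):
    """Exact optimum over all subsets of size k (exponential; illustration only)."""
    return max((frozenset(c) for c in combinations(ground, k)), key=f)


SETS: Dict[str, Set[int]] = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def coverage(s: FrozenSet[str]) -> int:
    return len(set().union(*(SETS[i] for i in s)))
```

Here `greedy_max(coverage, list(SETS), 2)` picks `"a"` and then `"c"`, covering all six elements, which on this instance is also optimal.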

The complex step of the greedy algorithm is in line 3, where we select the node that provides the largest marginal gain $\sigma_m(S \cup \{v\}) - \sigma_m(S)$ with respect to the expected spread of the current seed set $S$. Computing the expected spread given a seed set is #P-hard under both the IC model (Chen et al. 2010a) and the LT model (Chen et al. 2010b). In their paper, Kempe et al. run Monte Carlo (MC) simulations of the propagation model sufficiently many times (the authors report 10,000 trials) to obtain an accurate estimate of the expected spread, resulting in a very long computation time. Moreover, they show that for any $\phi > 0$, there is a $\delta > 0$ such that by using $(1 + \delta)$-approximate values of the expected spread, we can obtain a $(1 - 1/e - \phi)$-approximation for the influence maximization problem.
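When $\sigma_m$ can only be estimated, line 3 of Algorithm 1 is evaluated with Monte Carlo averages instead. The following Python sketch does this for the IC model. It reuses one random seed for every estimate (common random numbers), an implementation choice we assume for stability rather than something the text prescribes. The default of 200 runs is likewise an arbitrary illustrative value, far below the 10,000 trials reported by Kempe et al.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # hypothetical: node -> [(out-neighbor, p_vu)]


def ic_spread_once(graph: Graph, seeds: Set[int], rng: random.Random) -> int:
    """Number of nodes active at the end of one independent cascade run."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for v in frontier:
            for u, p_vu in graph.get(v, []):
                if u not in active and rng.random() < p_vu:
                    active.add(u)
                    nxt.append(u)
        frontier = nxt
    return len(active)


def mc_spread(graph: Graph, seeds: Set[int], runs: int, seed: int) -> float:
    """Monte Carlo estimate of sigma_IC(S); reusing the seed gives common random numbers."""
    rng = random.Random(seed)
    return sum(ic_spread_once(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_ic(graph: Graph, nodes: Iterable[int], k: int,
              runs: int = 200, seed: int = 0) -> Set[int]:
    """Algorithm 1 with the marginal gain in line 3 replaced by MC estimates."""
    nodes = list(nodes)
    chosen: Set[int] = set()
    while len(chosen) < k and len(chosen) < len(nodes):
        base = mc_spread(graph, chosen, runs, seed)
        u = max((w for w in nodes if w not in chosen),
                key=lambda w: mc_spread(graph, chosen | {w}, runs, seed) - base)
        chosen.add(u)
    return chosen
```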

We now define the problems we study in this paper. Let $m$ stand for any propagation model with a submodular coverage function $\sigma_m(\cdot)$.



Problem 1 (MINTSS) Let $G = (V, E)$ be a social graph. Given a real number $\eta \le |V|$, find a set $S \subseteq V$ of the smallest size $|S|$ such that the expected spread, denoted $\sigma_m(S)$, is no less than $\eta$.

Problem 2 (MINTIME) Let $G = (V, E)$ be a social graph. Given an integer $k$ and a real number $\eta \le |V|$, find a set $S \subseteq V$, $|S| \le k$, and the smallest $t \in \mathbb{N}$ such that the expected spread at time $t$, denoted $\sigma_m^t(S)$, is no less than $\eta$.

² A variant of the linear threshold model, where a deterministic threshold $\theta_u$ is chosen for each node, has also been studied (Chen 2008; Ben-Zwi et al. 2009). Coverage under this variant is not submodular.
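For MINTIME, the only change to the simulation is that propagation is cut off after $t$ steps. The Python sketch below estimates $\sigma_m^t(S)$ under IC by Monte Carlo and, for a fixed seed set $S$, scans $t = 0, 1, \ldots$ for the smallest $t$ whose estimate reaches $\eta$. It evaluates only the time dimension for a given $S$; choosing $S$ is the hard part of MINTIME. The graph encoding and function names are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # hypothetical: node -> [(out-neighbor, p_vu)]


def ic_active_within(graph: Graph, seeds: Set[int], t: int, rng: random.Random) -> int:
    """Number of nodes active after at most t propagation steps of one IC run."""
    active, frontier = set(seeds), list(seeds)
    for _ in range(t):
        nxt = []
        for v in frontier:
            for u, p_vu in graph.get(v, []):
                if u not in active and rng.random() < p_vu:
                    active.add(u)
                    nxt.append(u)
        if not nxt:
            break  # the cascade died out before the deadline
        frontier = nxt
    return len(active)


def spread_within(graph: Graph, seeds: Set[int], t: int,
                  runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of sigma^t(S)."""
    rng = random.Random(seed)
    return sum(ic_active_within(graph, seeds, t, rng) for _ in range(runs)) / runs


def min_time_for(graph: Graph, seeds: Set[int], eta: float, t_max: int,
                 runs: int = 1000, seed: int = 0) -> Optional[int]:
    """Smallest t <= t_max whose estimated sigma^t(S) is at least eta, or None."""
    for t in range(t_max + 1):
        if spread_within(graph, seeds, t, runs, seed) >= eta:
            return t
    return None
```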

The MINTSS problem is closely related to the real-valued submodular set cover (RSSC) problem, defined as follows: given a submodular function $f: 2^X \to \mathbb{R}$ and a threshold $\eta$, find a set $S \subseteq X$ of the least size (or minimum cost, when elements of $X$ are weighted) such that $f(S) \ge \eta$. MINTSS under any propagation model, such as IC and LT, for which the coverage function is submodular is clearly a special case of RSSC, an observation we exploit in Section 4.

MINTIME is closely related to the Robust Asymmetric $k$-center (RAKC) problem in directed graphs, defined as follows: given a digraph $G = (V, E)$, a (possibly empty) set of forbidden nodes and thresholds $k$ and $\eta$, find $k$ or fewer nodes $S$ such that they cover at least $\eta$ non-forbidden nodes in the minimum possible radius, i.e., each of the $\eta$ nodes is reachable from some node in $S$ within the minimum possible distance.
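As an illustration of the RAKC definition, the Python sketch below computes the nodes within distance $R$ of a center set by multi-source breadth-first search, then finds by exhaustive search the smallest radius at which at most $k$ centers cover at least $\eta$ non-forbidden nodes. It is exponential in $k$ and meant only to make the definition concrete; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Digraph = Dict[int, List[int]]  # node -> out-neighbors


def covered_within(adj: Digraph, centers: Iterable[int], radius: int) -> Set[int]:
    """Nodes reachable from some center by a directed path of length <= radius."""
    dist = {c: 0 for c in centers}
    queue = deque(dist)
    while queue:
        x = queue.popleft()
        if dist[x] == radius:
            continue  # do not expand beyond the radius
        for y in adj.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def rakc_brute_force(adj: Digraph, nodes: List[int], k: int, eta: int,
                     forbidden: FrozenSet[int] = frozenset()) -> Optional[Tuple[int, Set[int]]]:
    """Smallest radius R (with centers) such that <= k centers cover >= eta non-forbidden nodes."""
    for radius in range(len(nodes)):  # any reachable node lies within distance n - 1
        for size in range(1, k + 1):
            for centers in combinations(nodes, size):
                if len(covered_within(adj, centers, radius) - forbidden) >= eta:
                    return radius, set(centers)
    return None
```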



3 Related Work 

While, to the best of our knowledge, MINTIME has never been studied before, some work has been devoted to MINTSS. Chen (2008) shows that under the LT propagation model with fixed (and hence deterministic) thresholds, MINTSS cannot be approximated within a factor of $O(2^{\log^{1-\epsilon} n})$ unless $NP \subseteq DTIME(n^{\mathrm{polylog}(n)})$, and also gives a polynomial time algorithm for MINTSS on trees. Coverage under the LT model with deterministic thresholds is not submodular.



Ben-Zwi et al. (2009) build upon Chen (2008) and develop an $n^{O(w)}$ algorithm for solving MINTSS exactly under the deterministic linear threshold model, where $w$ is the tree width of the graph. They show the problem cannot be solved in $n^{o(\sqrt{w})}$ time unless all problems in SNP can be solved in sub-exponential time. In this paper, we study both MINTSS and MINTIME under the classic propagation models, under which the coverage function is submodular.

A few classical cover problems are related to the problems we study. One such problem is Maximum Coverage (MC): given a collection of sets $\mathcal{S}$ over a ground set $U$ and budget $k$, find a subcollection $\mathcal{C} \subseteq \mathcal{S}$ such that $|\mathcal{C}| \le k$ and $|\bigcup \mathcal{C}|$ is maximized. The problem can be approximated within a factor of $(1 - 1/e)$, and this factor cannot be improved (Feige 1998; Khuller et al. 1999). Similar results (Khuller et al. 1999; Sviridenko 2004) exist for the weighted case.

Another relevant problem is Partial Set Cover (PSC): given a collection of sets $\mathcal{S}$ over the ground set $U$ and a threshold $\eta$, the goal is to find a subcollection $\mathcal{C} \subseteq \mathcal{S}$ such that $|\bigcup \mathcal{C}| \ge \eta$ and $|\mathcal{C}|$ is minimized. While PSC can be approximated within a factor of $\lceil \ln \eta \rceil$, Feige (1998) showed that it cannot be approximated within a factor of $(1 - \delta)\ln \eta$, for any fixed $\delta > 0$, unless $NP \subseteq DTIME(n^{O(\log \log n)})$.

Our results on MINTSS exploit its connection to the real-valued submodular set cover (RSSC) problem. There has been substantial work on submodular set cover (SSC) in the presence of integer-valued submodular functions, which is a generalization of the classical Set Cover Problem (Fujito 1999, 2000; Feige 1998; Slavík 1997; Bar-Ilan et al. 2001). Much less work has been done on real-valued SSC. For non-decreasing real-valued submodular functions, Wolsey (1982) has shown, among other things, that a simple greedy algorithm yields a solution to a special case of SSC where $\eta = f(X)$ that is within a factor of $\ln[\eta / (\eta - f(S_{t-1}))]$ of the optimal solution, where $t$ is the number of iterations needed by the greedy algorithm to achieve a coverage of $\eta$ and $S_i$ denotes the greedy solution after $i$ iterations. Unfortunately, this result by itself does not yield an approximation algorithm with any guaranteed bounds: in Appendix B we give an example to show that the greedy solution can be arbitrarily worse than the optimal one. Furthermore, Wolsey's analysis is restricted to the case $\eta = f(X)$. Along the way to establishing our results on MINTSS, we show that the greedy algorithm yields a bicriteria approximation for real-valued SSC that extends to the general case of partial cover with $\eta < f(X)$, and where elements are weighted.

Our results on MINTIME leverage its connection to the robust asymmetric $k$-center problem (RAKC). It has been shown that, while the asymmetric $k$-center problem can be approximated within a factor of $O(\log^* n)$ (Panigrahy and Vishwanathan 1998), RAKC cannot be approximated within any factor unless P = NP (Li Gørtz and Wirth 2006).



4 Minimum Target Set Selection 

4.1 A Bicriteria Approximation 

Our main result of this section is that a simple greedy 
algorithm, Algorithm Greedy-Mintss, yields a bicri- 
teria approximation to (weighted) MINTSS, for any 
propagation model whose coverage function is mono- 
tone and submodular. 

In order to prove the results in the most general setting, we consider digraphs $G = (V, E)$ which have non-negative node weights: we are given a cost function $c: V \to \mathbb{R}^+$ in addition to the coverage threshold $\eta$, and need to find a seed set $S$ such that $\sigma_m(S) \ge \eta$ and $c(S) = \sum_{x \in S} c(x)$ is minimum. Clearly, this generalizes the unweighted case.

Algorithm 2 Greedy-Mintss
Input: $G, \eta, \epsilon, \sigma_m$
Output: seed set $S$
1: $S \leftarrow \emptyset$
2: while $\sigma_m(S) < \eta - \epsilon$ do
3:   $u \leftarrow \arg\max_{w \in V \setminus S} (\min(\sigma_m(S \cup \{w\}), \eta) - \sigma_m(S))$
4:   $S \leftarrow S \cup \{u\}$
5: end while
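The greedy algorithm in the weighted form used in the proof of Theorem 1 can be sketched in Python as follows. It selects by marginal gain per unit cost of the truncated function $g(S) = \min(f(S), \eta)$ and stops once $f(S) \ge \eta - \epsilon$. The coverage oracle `f`, the cost dictionary, and the function name are assumptions. With all costs equal to 1, it behaves like the unweighted Greedy-Mintss.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable


def greedy_mintss(f: Callable[[FrozenSet[Hashable]], float],
                  ground: Iterable[Hashable],
                  cost: Dict[Hashable, float],
                  eta: float, eps: float) -> FrozenSet[Hashable]:
    """Greedy for (weighted) RSSC: stop as soon as f(S) >= eta - eps."""
    ground = list(ground)

    def g(s: FrozenSet[Hashable]) -> float:
        return min(f(s), eta)  # truncated coverage g(S) = min(f(S), eta)

    s: FrozenSet[Hashable] = frozenset()
    while f(s) < eta - eps:
        best, best_ratio = None, 0.0
        for x in ground:
            if x in s:
                continue
            ratio = (g(s | {x}) - g(s)) / cost[x]  # marginal gain per unit cost
            if ratio > best_ratio:
                best, best_ratio = x, ratio
        if best is None:
            raise ValueError("no element increases coverage; eta - eps is unreachable")
        s = s | {best}
    return s


# Hypothetical weighted set-coverage instance.
SETS = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {3, 4}, "d": {5}}
COST = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 1.0}


def coverage(s: FrozenSet[Hashable]) -> int:
    return len(set().union(*(SETS[x] for x in s)))
```

On this instance, `greedy_mintss(coverage, SETS, COST, eta=5, eps=0.5)` picks `"b"`, `"c"` and `"d"` (total cost 3) rather than the large but expensive set `"a"`.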

Theorem 1 Let $G = (V, E)$ be a social graph, with node weights given by $c: V \to \mathbb{R}^+$. Let $m$ be any propagation model whose coverage function $\sigma_m(\cdot)$ is monotone and submodular. Let $S^*$ be a seed set of minimum cost such that $\sigma_m(S^*) \ge \eta$. Let $\epsilon > 0$ be any shortfall and let $S$ be the greedy solution with chosen threshold $\eta - \epsilon$. Then $c(S) \le c(S^*) \cdot (1 + \ln(\eta/\epsilon))$.

In the rest of this section, we prove this result. We first observe that every instance of MINTSS where the coverage function $\sigma_m(\cdot)$ is monotone and submodular is an instance of RSSC. Thus, it suffices to prove Theorem 1 for RSSC, for which we adapt a bicriterion approximation technique by Slavík (1997).

Let $X = \{x_1, x_2, \ldots, x_m\}$ be a ground set, $c: X \to \mathbb{R}^+$ a cost function, $f: 2^X \to \mathbb{R}^+$ a non-negative monotone submodular function, and $\eta$ a given threshold. Apply the greedy algorithm above to this instance of RSSC. Let $S_i$ be the (partial) solution obtained by the greedy algorithm after $i$ iterations. Let $t$ be the smallest number such that $f(S_t) \ge \eta$. We define $g(S) = \min(f(S), \eta)$. Clearly, $g$ is also monotone and submodular. In each iteration, the greedy algorithm picks an element which provides the maximum marginal gain per unit cost (w.r.t. $g$), i.e., it picks an element $x$ for which $\frac{g(S \cup \{x\}) - g(S)}{c(x)}$ is positive and maximum.

Let $c(S^*) = \kappa$ and define $\eta_i = \eta - g(S_i)$, i.e., the shortfall in coverage after $i$ iterations of the greedy algorithm.

Lemma 1 At the end of iteration $i$, there is an element $x \in X \setminus S_i$ such that $\frac{g(S_i \cup \{x\}) - g(S_i)}{c(x)} \ge \frac{\eta_i}{\kappa}$.

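Proof. Let $S_i^* = S^* \setminus S_i$, with $S_i^* = \{y_1, \ldots, y_r\}$ and $c(S_i^*) = \kappa_i \le \kappa$. Suppose that for all $x \in X \setminus S_i$,

$$\frac{g(S_i \cup \{x\}) - g(S_i)}{c(x)} < \frac{\eta_i}{\kappa}.$$

Consider adding the elements in $S_i^*$ to $S_i$ one by one. Clearly, at any step $j \le r$, we have by submodularity that

$$g(S_i \cup \{y_1, \ldots, y_j\}) - g(S_i \cup \{y_1, \ldots, y_{j-1}\}) \le g(S_i \cup \{y_j\}) - g(S_i) < c(y_j) \cdot \frac{\eta_i}{\kappa}. \qquad (1)$$

Iterating over all $j$, this yields $g(S_i \cup \{y_1, \ldots, y_j\}) - g(S_i) < \frac{\eta_i}{\kappa} \cdot (c(y_1) + \cdots + c(y_j))$, resulting in

$$g(S_i \cup \{y_1, \ldots, y_r\}) < g(S_i) + \frac{\eta_i}{\kappa} \sum_{1 \le j \le r} c(y_j) \le g(S_i) + \eta_i = \eta,$$

which is a contradiction since the left-hand side equals $g(S_i \cup S^*) \ge g(S^*) = \eta$, the optimal coverage. □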

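Proof of Theorem 1

It follows from Lemma 1 that $\eta_i \le \eta_{i-1}(1 - c_i/\kappa)$, where $c_i$ is the cost of the element added in iteration $i$. Using the well-known inequality $1 + z \le e^z$ for all $z$, we get $\eta_i \le \eta_{i-1} \cdot e^{-c_i/\kappa}$. Expanding, $\eta_i \le \eta \cdot e^{-\sum_{j \le i} c_j/\kappa}$. Let the algorithm take $l$ iterations to achieve coverage $g(S_l) \ge \eta - \epsilon$, so that $g(S_{l-1}) < \eta - \epsilon$. At any step, $g(S_{i+1}) - g(S_i) \le \eta_i$; since, by Lemma 1, the element picked in iteration $i + 1$ has marginal gain per unit cost at least $\eta_i/\kappa$, its cost satisfies $c_{i+1} \le \kappa$. In particular, the cost of the last element picked is at most $\kappa$, so $c(S_l) \le \kappa + c(S_{l-1})$. Moreover, $g(S_{l-1}) < \eta - \epsilon$ implies $\eta_{l-1} > \epsilon$. Hence, $\eta e^{-c(S_{l-1})/\kappa} \ge \eta_{l-1} > \epsilon$, which implies $c(S_{l-1}) < \kappa \ln(\eta/\epsilon)$. Thus, $c(S_l) < \kappa(1 + \ln(\eta/\epsilon))$. □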
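The two inequalities that drive this proof can be sanity-checked numerically: the per-iteration contraction $\eta_i \le \eta_{i-1}(1 - c_i/\kappa)$ and the final bound $c(S_l) \le \kappa(1 + \ln(\eta/\epsilon))$. The Python sketch below does so on small random weighted set-coverage instances, computing $\kappa = c(S^*)$ by brute force. The instance sizes, random seed, and helper names are arbitrary choices for illustration.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple


def coverage(sets: Dict[int, Set[int]], chosen) -> int:
    return len(set().union(*(sets[i] for i in chosen)))


def optimal_cost(sets: Dict[int, Set[int]], cost: Dict[int, float], eta: float) -> float:
    """kappa = c(S*): cheapest subcollection with coverage >= eta (brute force)."""
    best = math.inf
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            if coverage(sets, combo) >= eta:
                best = min(best, sum(cost[i] for i in combo))
    return best


def greedy_trace(sets, cost, eta, eps) -> Tuple[List[int], List[Tuple[float, float, float]]]:
    """Greedy with threshold eta - eps; records (eta_{i-1}, c_i, eta_i) for each iteration."""
    def g(chosen) -> float:
        return min(coverage(sets, chosen), eta)

    chosen: List[int] = []
    trace = []
    while coverage(sets, chosen) < eta - eps:
        x = max((y for y in sets if y not in chosen),
                key=lambda y: (g(chosen + [y]) - g(chosen)) / cost[y])
        before = eta - g(chosen)
        chosen.append(x)
        trace.append((before, cost[x], eta - g(chosen)))
    return chosen, trace


def check_random_instances(trials: int = 30, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        sets = {i: set(rng.sample(range(12), rng.randint(1, 5))) for i in range(7)}
        cost = {i: rng.uniform(0.5, 3.0) for i in sets}
        eta, eps = coverage(sets, list(sets)), 0.5  # eta = f(X) keeps the instance feasible
        kappa = optimal_cost(sets, cost, eta)
        chosen, trace = greedy_trace(sets, cost, eta, eps)
        for before, c_i, after in trace:
            assert after <= before * (1 - c_i / kappa) + 1e-9  # consequence of Lemma 1
        assert sum(cost[i] for i in chosen) <= kappa * (1 + math.log(eta / eps)) + 1e-9
    return True
```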

Using a similar analysis, it can be shown that when the costs are uniform, the approximation factor can be improved to $\lceil \ln(\eta/\epsilon) \rceil$.

For propagation models like IC and LT, computing the coverage $\sigma_m(S)$ exactly is #P-hard (Chen et al. 2010a,b), and thus we must settle for estimates. To address this, we "lift" the above theorem to the case where only estimates of the function $f(\cdot)$ are available. We can show:



Theorem 2 For any $\phi > 0$, there exists a $\delta \in (0, 1)$ such that, using $(1 - \delta)$-approximate values for the coverage function $\sigma_m(\cdot)$, the greedy algorithm approximates MINTSS under the IC and LT models within a factor of $(1 + \phi) \cdot (1 + \ln(\eta/\epsilon))$.

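Proof. The proof involves a more careful analysis of how error propagates in the greedy algorithm if, because of errors, the greedy algorithm picks the wrong point.

Here, we give the proof for the unit cost version only. Consider any monotone, submodular function $f(\cdot)$ with $f(\emptyset) = 0$; thus, in the statement of the theorem, $\sigma_m(\cdot) = f(\cdot)$. Let $f'(\cdot)$ be its approximated value, with $(1 - \delta) f(S) \le f'(S) \le f(S)$ for all $S$. In any iteration, the (standard) greedy algorithm picks an element which provides the maximum marginal gain with respect to $f'$. Let $S_i$ be the set formed after iteration $i$, let $\eta_i = \eta - f'(S_i)$ (so that $\eta_0 = \eta$), and let $k = |S^*|$ denote the size of an optimal solution.

As we did in Lemma 1, it is straightforward to show that there must exist an element $x \in X \setminus S_i$ such that $f(S_i \cup \{x\}) - f(S_i) \ge (\eta - f(S_i))/k$. Without loss of generality, let $x$ be the element which provides the maximum marginal gain. Suppose that, due to the error in computing $f(\cdot)$, some other element $y$ is picked instead, so that $f'(S_i \cup \{y\}) \ge f'(S_i \cup \{x\})$. Then

$$(1 - \delta) f(S_i \cup \{x\}) \le f'(S_i \cup \{x\}) \le f(S_i \cup \{x\}).$$

Moreover, $f'(S_i) \le f(S_i)$, so $\eta - f(S_i) \le \eta_i$. Thus,

$$\eta_{i+1} = \eta - f'(S_i \cup \{y\}) \le \eta - (1 - \delta) f(S_i \cup \{x\}) \le \eta - (1 - \delta)\left[f(S_i) + \frac{\eta - f(S_i)}{k}\right] = \delta \eta + (1 - \delta)\left(1 - \frac{1}{k}\right)(\eta - f(S_i))$$

$$\le \eta_i (1 - \delta)\left(1 - \frac{1}{k}\right) + \delta \eta.$$

Unrolling this recurrence from $\eta_0 = \eta$ gives

$$\eta_{i+1} \le \eta (1 - \delta)^{i+1}\left(1 - \frac{1}{k}\right)^{i+1} + \delta \eta \cdot \frac{1 - (1 - \delta)^{i+1}(1 - 1/k)^{i+1}}{1 - (1 - \delta)(1 - 1/k)}.$$

Let $\delta' = \delta / (1 - (1 - \delta)(1 - 1/k))$. Let the greedy algorithm take $l$ iterations. Then

$$\eta_l \le \eta (1 - \delta)^l \left(1 - \frac{1}{k}\right)^l + \delta' \eta \left[1 - (1 - \delta)^l \left(1 - \frac{1}{k}\right)^l\right] = \eta (1 - \delta)^l \left(1 - \frac{1}{k}\right)^l (1 - \delta') + \delta' \eta.$$

Using $(1 - \delta)^l \le 1$ and $(1 - 1/k)^l \le e^{-l/k}$,

$$\eta_l \le \eta e^{-l/k} (1 - \delta') + \delta' \eta.$$

The algorithm stops when $\eta_l \le \epsilon$, which guarantees $f(S_l) \ge f'(S_l) \ge \eta - \epsilon$. Provided $\delta' \eta < \epsilon$, the maximum number of iterations needed to ensure this is

$$l \le k \left(1 + \ln \frac{\eta (1 - \delta')}{\epsilon - \delta' \eta}\right).$$

Let $\rho = \eta / \epsilon > 1$, so that $\frac{\eta (1 - \delta')}{\epsilon - \delta' \eta} = \frac{\rho (1 - \delta')}{1 - \delta' \rho}$. To prove the theorem, we need to prove that for any $\phi > 0$, there exists $\delta \in [0, 1)$ such that

$$\frac{\rho (1 - \delta')}{1 - \delta' \rho} = \rho^{1 + \phi}, \quad \text{i.e.,} \quad \delta' = \frac{\rho^{\phi} - 1}{\rho^{1 + \phi} - 1},$$

since then $l \le k(1 + (1 + \phi)\ln(\eta/\epsilon)) \le (1 + \phi) \cdot k (1 + \ln(\eta/\epsilon))$. Clearly, for any $\phi > 0$, this $\delta'$ satisfies $\delta' \in (0, 1)$ and $\delta' \rho < 1$. Since $\delta' = \delta / (\delta(1 - 1/k) + 1/k)$ increases continuously from 0 to 1 as $\delta$ ranges over $[0, 1]$, there is a $\delta \in (0, 1)$ attaining this value of $\delta'$; explicitly, $\delta = \frac{\delta'/k}{1 - \delta'(1 - 1/k)}$.

This completes the proof for the unit cost case. Using the slight modification in the greedy algorithm (as we did in proving Theorem 1), the same result can be obtained for the weighted version. □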
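The choice of $\delta$ at the end of this proof is constructive. The short Python sketch below computes $\delta'$ from $\phi$ and $\rho = \eta/\epsilon$, inverts $\delta' = \delta/(1 - (1 - \delta)(1 - 1/k))$ to recover $\delta$, and evaluates the iteration bound $k(1 + \ln(\eta(1 - \delta')/(\epsilon - \delta'\eta)))$. The function names and the parameter values used in the checks are arbitrary.

```python
import math


def delta_prime(phi: float, eta: float, eps: float) -> float:
    """delta' = (rho^phi - 1) / (rho^(1 + phi) - 1), with rho = eta / eps > 1."""
    rho = eta / eps
    return (rho ** phi - 1) / (rho ** (1 + phi) - 1)


def delta_from_delta_prime(dp: float, k: int) -> float:
    """Solve dp = d / (1 - (1 - d)(1 - 1/k)) = d / (d (1 - 1/k) + 1/k) for d."""
    return (dp / k) / (1 - dp * (1 - 1 / k))


def iteration_bound(k: int, eta: float, eps: float, dp: float) -> float:
    """k (1 + ln(eta (1 - dp) / (eps - dp * eta))); requires dp * eta < eps."""
    return k * (1 + math.log(eta * (1 - dp) / (eps - dp * eta)))
```

With $\phi = 0.5$, $\eta = 100$, $\epsilon = 1$ and $k = 10$, the ratio inside the logarithm comes out to exactly $\rho^{1.5} = 1000$, so the bound equals $k(1 + 1.5 \ln 100)$, as the proof predicts.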
4.2 An Inapproximability Result

Recall that every instance of MINTSS where the coverage function is monotone and submodular is an instance of RSSC. Consider the unweighted version of the RSSC problem. Let $S^*$ denote an optimal solution and let $OPT = |S^*|$.

Theorem 3 For any fixed $\delta > 0$, there does not exist a PTIME algorithm for RSSC that guarantees a solution $S$ with $|S| \le OPT(1 - \delta)\ln(\eta/\epsilon)$ and $f(S) \ge \eta - \epsilon$ for any $\epsilon > 0$, unless $NP \subseteq DTIME(n^{O(\log \log n)})$.

Proof. Case 1: $\epsilon \ge 1$. Suppose there exists an algorithm $A$ that finds a solution $S$ of size $\le OPT(1 - \delta)\ln(\eta/\epsilon)$ such that $f(S) \ge \eta - \epsilon$ for any $\epsilon \ge 1$. Consider an arbitrary instance $I = (U, \mathcal{S}, \eta)$ of PSC, which is a special case of RSSC. Apply the algorithm $A$ to $I$. It outputs a collection of sets $\mathcal{C}_1$ with $|\mathcal{C}_1| \le OPT(1 - \delta)\ln(\eta/\epsilon)$ that covers $\ge \eta - \epsilon$ elements in $U$.

Create a new instance $J = (U', \mathcal{S}', \eta')$ of PSC as follows. Let $T = \bigcup \mathcal{C}_1$ be the set of elements of $U$ covered by $\mathcal{C}_1$. Define $\mathcal{S}' = \{S \setminus T \mid S \in \mathcal{S} \setminus \mathcal{C}_1\}$, $U' = U \setminus T$ and $\eta' = \epsilon$. Set the new shortfall $\epsilon' = 1$. Apply the algorithm $A$ to $J$. It will output another collection of sets $\mathcal{C}_2$ with $|\mathcal{C}_2| \le OPT(1 - \delta)\ln \epsilon$ which covers $\ge \epsilon - 1$ elements in $U'$.³ Let $\mathcal{C} = \mathcal{C}_1 \cup \mathcal{C}_2$. The number of elements covered by $\mathcal{C}$ is $\ge \eta - \epsilon + \epsilon - 1 = \eta - 1$. Clearly, $|\mathcal{C}| = |\mathcal{C}_1| + |\mathcal{C}_2| \le OPT(1 - \delta)\ln(\eta/\epsilon) + OPT(1 - \delta)\ln(\epsilon) = OPT(1 - \delta)\ln(\eta)$. Thus, we have a solution for PSC with an approximation factor of $(1 - \delta)\ln(\eta)$, which is not possible unless $NP \subseteq DTIME(n^{O(\log \log n)})$ (Feige 1998). This proves Case 1.

Case 2: $\epsilon < 1$. Assume an arbitrary instance $I$ of RSSC with monotone submodular function $f: 2^X \to \mathbb{R}$. Let $\eta'$ be the coverage threshold and $\epsilon' \ge 1$ be any given shortfall. We now construct another instance $J$ of RSSC as follows: set the coverage function $g(S) = f(S)/x$, coverage threshold $\eta = \eta'/x$ and shortfall $\epsilon = \epsilon'/x$. Choose any value of $x > 1$ such that $\epsilon = \epsilon'/x < 1$. We now show that if a solution is a $(1 - \delta)\ln(\eta/\epsilon)$-approximation to the optimal solution for $J$, then it is a $(1 - \delta)\ln(\eta'/\epsilon')$-approximation to the optimal solution for $I$. Clearly, the optimal solutions for both instances are identical, so $OPT_I = OPT_J$.⁴ Suppose there exists an algorithm for RSSC when the shortfall is $\epsilon \in (0, 1)$ that guarantees a solution $S$ with $|S| \le OPT(1 - \delta)\ln(\eta/\epsilon)$ and $f(S) \ge \eta - \epsilon$. Apply this algorithm to instance $J$ to obtain a solution $S_J$. We have $g(S_J) \ge \eta - \epsilon = (\eta' - \epsilon')/x$. It implies $f(S_J) = x \cdot g(S_J) \ge \eta' - \epsilon'$. Moreover, $|S_J| \le OPT_J(1 - \delta)\ln(\eta/\epsilon)$, implying $|S_J| \le OPT_I(1 - \delta)\ln(\eta'/\epsilon')$, since $\eta/\epsilon = \eta'/\epsilon'$. Thus we have the solution $S_J$ for instance $I$ whose size is $\le OPT_I(1 - \delta)\ln(\eta'/\epsilon')$, which Case 1 rules out. The theorem follows. □

In view of this generic result, we conjecture that improving the approximation factor for MINTSS to $(1 - \delta)\ln(\eta/\epsilon)$ for IC and LT is likely to be hard.

³ If $\epsilon = 1$, $A$ outputs an empty collection.

⁴ Here, $OPT_I$ and $OPT_J$ represent the size of the optimal solution for instances $I$ and $J$ respectively.



5 MINTIME 

In this section, we study MINTIME under the IC model. Denote by $\sigma_m^R(S)$ the expected number of nodes activated under model $m$ within time $R$, and let $\eta$ be the desired coverage and $k$ be the desired budget. Let $R_{opt}$ denote the optimal propagation time under these budget and coverage constraints. Our first result says that efficient approximation algorithms are unlikely to exist under two scenarios: (i) when we allow a coverage shortfall of less than $\eta/e$, and (ii) when we allow a budget overrun factor of less than $\ln \eta$. In the former scenario, we have a strict budget threshold and in the latter we have a strict coverage threshold. In both cases, we allow any amount of slack in propagation time.

Theorem 4 Unless $NP \subseteq DTIME(n^{O(\log \log n)})$, there does not exist a PTIME algorithm for MINTIME that guarantees (for any $\alpha \ge 1$):

1. an $(\alpha, \gamma)$-approximation, such that $|S| \le k$, $R = \alpha \cdot R_{opt}$ and $\sigma_m^R(S) \ge \gamma \cdot \eta$, where $\gamma = 1 - 1/e + \delta$ for any fixed $\delta > 0$; or

2. an $(\alpha, \beta)$-approximation, such that $|S| \le \beta \cdot k$, $R = \alpha \cdot R_{opt}$ and $\sigma_m^R(S) \ge \eta$, where $\beta = (1 - \delta)\ln \eta$ for any fixed $\delta > 0$.

Our second theorem says that efficient approximation algorithms are unlikely to exist under more liberal scenarios than those given above: (i) when, for a given budget overrun factor $\beta < \ln \eta$, the fraction of the coverage we want to achieve is more than $1 - 1/e^{\beta}$, and (ii) when, for a given fraction $\gamma \in (0, 1 - 1/\eta]$ of the coverage we want to achieve, the budget overrun factor we allow is less than $\ln(1/(1 - \gamma))$. As before, we allow any amount of slack in propagation time.

Theorem 5 Unless $NP \subseteq DTIME(n^{O(\log\log n)})$, there does not exist a PTIME algorithm for MINTIME that guarantees an $(\alpha, \beta, \gamma)$-approximation factor (for any $\alpha \ge 1$) such that $|S| \le \beta \cdot k$, $R = \alpha \cdot R_{opt}$ and $\sigma_m^R(S) \ge \gamma \cdot \eta$ where

1. $\beta \in [1, \ln\eta)$ and $\gamma = 1 - 1/e^{\beta} + \delta$ for any fixed $\delta > 0$; or

2. $\gamma \in (0, 1 - 1/\eta]$ and $\beta = (1 - \delta)\ln(1/(1-\gamma))$ for any fixed $\delta > 0$.



Finally, on the positive side, we show that when a coverage shortfall of $\epsilon > 0$ is allowed and a budget boost by a factor of $(1 + \ln(\eta/\epsilon))$ is allowed, we can in PTIME find a solution which achieves the relaxed coverage under the relaxed budget in optimal propagation time. More precisely, we have:

Theorem 6 Let the chosen coverage threshold be $\eta - \epsilon$, for $\epsilon > 0$, and the chosen budget threshold be $k(1 + \ln(\eta/\epsilon))$. If the coverage function $\sigma_m^R(\cdot)$ can be computed exactly, then there is a greedy algorithm that approximates the MINTIME problem within an $(\alpha, \beta, \gamma)$ factor where $\alpha = 1$, $\beta = 1 + \ln(\eta/\epsilon)$ and $\gamma = 1 - \epsilon/\eta$ for any $\epsilon > 0$. Furthermore, for every $\phi > 0$, there is a $\delta > 0$ such that by using $(1 - \delta)$-approximate values for the coverage function $\sigma_m^R(\cdot)$, the greedy algorithm approximates the MINTIME problem within an $(\alpha, \beta, \gamma)$ factor where $\alpha = 1$, $\beta = (1 + \phi)(1 + \ln(\eta/\epsilon))$ and $\gamma = 1 - \epsilon/\eta$.



5.1 Inapproximability Proofs 

We next prove Theorems 4 and 5. We first show that MINTIME under the IC model generalizes the RAKC problem. In a digraph $G = (V, E)$ and sets of nodes $S, T \subseteq V$, say that $S$ $R$-covers $T$ if for every $y \in T$, there is an $x \in S$ such that there is a path of length at most $R$ from $x$ to $y$. Given an instance of RAKC, create an instance of MINTIME by labeling each arc in the digraph with probability 1. Now, it is easy to see that for any set of nodes $S$ and any $0 \le R \le n - 1$, $S$ $R$-covers a set of nodes $T$ iff activating the seed nodes $S$ will result in the set of nodes $T$ being activated within $R$ time steps. Notice that since all the arcs are labeled with probability 1, all influence attempts are successful by construction. It follows that RAKC is a special case of MINTIME under the IC model.
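For readers who want to check this equivalence, a multi-source breadth-first search computes exactly the set of nodes that a seed set $S$ $R$-covers, which, with every arc labeled with probability 1, is also the set the IC model activates within $R$ steps. The sketch below is our own illustration, not the authors' code; the adjacency-list format and the names `r_covered` and `f_R` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set


def r_covered(adj: Dict[Hashable, List[Hashable]],
              seeds: Iterable[Hashable], R: int) -> Set[Hashable]:
    """Nodes reachable from some seed by a directed path of length <= R.

    With all arc probabilities equal to 1, this is also the set of nodes the
    IC model activates within R time steps when `seeds` are active at time 0.
    """
    dist = {s: 0 for s in seeds}  # each seed covers itself (path of length 0)
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        if dist[u] == R:  # do not expand beyond R hops
            continue
        for v in adj.get(u, []):
            if v not in dist:  # BFS: first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def f_R(adj: Dict[Hashable, List[Hashable]],
        seeds: Iterable[Hashable], R: int) -> int:
    """Number of nodes R-covered by the seed set."""
    return len(r_covered(adj, seeds, R))
```

Because BFS visits nodes in order of distance, each node is labeled with its shortest distance from the seed set, so cutting the expansion at depth $R$ yields exactly the $R$-covered set.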

The tricriteria inapproximability results of Theorem 5 subsume the bicriteria inapproximability results of Theorem 4. Still, in our presentation, we find it convenient to develop the proofs for the bicriteria case first. Since we showed that MINTIME under IC generalizes RAKC, it suffices to prove the theorems in the context of RAKC. It is worth pointing out that Li Gørtz and Wirth (2006) proved that it is hard to approximate RAKC within any factor unless P = NP. Their proof only applies to (the standard) unicriterion approximation.

For a set of nodes $S$ in a digraph, we denote by $f_R(S)$ the number of nodes that are $R$-covered by $S$. Recall the problems MC and PSC (see Section 3).



Proof of Theorem 4 It suffices to prove the theorem for RAKC. For claim 1, we reduce Maximum Coverage (MC) to RAKC and for claim 2, we reduce PSC to RAKC. The reduction is similar in both cases and is as follows. Consider an instance of the decision version of MC (equivalently PSC) $\mathcal{I} = (U, \mathcal{S}, k, \eta)$, where we ask whether there exists a subcollection $\mathcal{C} \subseteq \mathcal{S}$ of size at most $k$ such that $|\bigcup_{S \in \mathcal{C}} S| \ge \eta$. Construct an instance $\mathcal{J} = (\mathcal{G}, k', \eta')$ of RAKC as follows: the graph $\mathcal{G}$ consists of two classes of nodes, A and B. For each $S \in \mathcal{S}$, create a class A node $v_S$, and for each $u \in U$, create a class B node $v_u$. There is a directed edge $(v_S, v_u)$ of unit length iff $u \in S$. Notice that a set of nodes $S$ in $\mathcal{G}$ $R$-covers another non-empty set of nodes iff $S$ 1-covers the latter set. Moreover, $x$ sets in $\mathcal{S}$ cover $y$ elements in $U$ iff $\mathcal{G}$ has a set of $x$ nodes which 1-covers $y + x$ nodes. The only-if direction is trivial. For the if direction, the only way $x$ nodes can 1-cover $y + x$ nodes in $\mathcal{G}$ is when the $x$ nodes are from class A.
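The reduction is easy to build and test mechanically. The following sketch is our own illustration; the tuple encoding of class A and class B nodes and the function names are assumptions. It constructs $\mathcal{G}$ from a set system and checks, by brute force on a toy instance, that every choice of $x$ sets covering $y$ elements yields $x$ class A nodes that 1-cover exactly $y + x$ nodes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Node = Tuple[str, str]  # ("A", set name) or ("B", element)


def mc_to_rakc(sets: Dict[str, FrozenSet[str]]) -> Dict[Node, List[Node]]:
    """Build G: a unit-length arc from class-A node v_S to class-B node v_u iff u is in S."""
    adj: Dict[Node, List[Node]] = {("A", name): [("B", u) for u in sorted(members)]
                                   for name, members in sets.items()}
    for members in sets.values():
        for u in members:
            adj.setdefault(("B", u), [])  # class-B nodes have no outgoing arcs
    return adj


def one_cover_size(adj: Dict[Node, List[Node]], nodes: List[Node]) -> int:
    """Number of nodes 1-covered by `nodes` (every node also covers itself)."""
    covered = set(nodes)
    for x in nodes:
        covered.update(adj[x])
    return len(covered)


# Toy instance: x sets covering y elements give x class-A nodes 1-covering y + x nodes.
sets = {"S1": frozenset({"u1", "u2"}), "S2": frozenset({"u2", "u3"}), "S3": frozenset({"u4"})}
adj = mc_to_rakc(sets)
for x in range(1, len(sets) + 1):
    for combo in combinations(sets, x):
        y = len(frozenset().union(*(sets[name] for name in combo)))
        assert one_cover_size(adj, [("A", name) for name in combo]) == y + x
```

A class B node, by contrast, 1-covers only itself, which is why only class A nodes can reach the $y + x$ count.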

Next, we prove the first claim. Set $k' = k$ and $\eta' = \eta + k$. Assume there exists a PTIME $(\alpha, \gamma)$-approximation algorithm $A$ for RAKC such that $f_R(S) \ge (1 - 1/e + \delta)\cdot\eta'$ for any fixed $\delta > 0$, for some $R \le \alpha \cdot R_{opt}$. Apply algorithm $A$ to the instance $\mathcal{J}$. Notice that for our instance, $R_{opt} = 1$. The coverage by the output seed set $S$ will be $f_R(S) \ge (1 - 1/e + \delta)\cdot(\eta + k)$ nodes, for some $R \le \alpha \cdot 1$, implying that the number of class B nodes covered is at least $(1 - 1/e + \delta)(\eta + k) - k = (1 - 1/e + \delta - (1/e - \delta)k/\eta)\,\eta$. Thus the algorithm approximates MC within a factor of $(1 - 1/e + \delta - (1/e - \delta)k/\eta)$. Let $\delta' = \delta - (1/e - \delta)k/\eta$. If we show $\delta' > 0$, we are done, since MC cannot be approximated within a factor of $(1 - 1/e + \delta')$ for any $\delta' > 0$ unless $NP \subseteq DTIME(n^{O(\log\log n)})$ (Feige 1998; Khuller et al 1999). Clearly, $\delta'$ is not always positive.

However, for a given $\delta$ and $k$, $\delta'$ is an increasing function of $\eta$ and reaches $\delta$ in the limit. Hence there is a value $\eta_0$ such that $\delta' > 0$ for all $\eta > \eta_0$. That is, there are infinitely many instances of MC for which $A$ is a $(1 - 1/e + \delta')$-approximation algorithm, where $\delta' > 0$, which proves the first claim.
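The threshold $\eta_0$ can be made explicit: rearranging $\delta' > 0$ gives $\eta > k(1/e - \delta)/\delta$ when $\delta < 1/e$, while for $\delta \ge 1/e$ the subtracted term is non-positive and $\delta' \ge \delta$ for every $\eta$. The small Python check below is our own illustration of that arithmetic; the function names are hypothetical.

```python
import math


def delta_prime(delta: float, k: int, eta: float) -> float:
    """delta' = delta - (1/e - delta) * k / eta, as in the proof of claim 1."""
    return delta - (1 / math.e - delta) * k / eta


def eta_threshold(delta: float, k: int) -> float:
    """eta_0 such that delta' > 0 for every eta > eta_0.

    If delta >= 1/e the subtracted term is <= 0, so any eta > 0 works.
    """
    return max(0.0, k * (1 / math.e - delta) / delta)


# Example: delta = 0.01 and k = 10 give eta_0 of roughly 358.
eta0 = eta_threshold(0.01, 10)
print(round(eta0, 1), delta_prime(0.01, 10, 300) < 0, delta_prime(0.01, 10, 400) > 0)
# prints: 357.9 True True
```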

Next, we prove the second claim. Set $k' = k$ and $\eta' = \eta + x$. The value of $x$ will be decided later. Assume there exists a PTIME $(\alpha, \beta)$-approximation algorithm $A$ for RAKC where $\beta = (1 - \delta)\ln(\eta')$ for any fixed $\delta > 0$. Apply the algorithm to $\mathcal{J}$. It gives a solution $S$ such that $|S| \le k \cdot (1 - \delta)\ln(\eta + x)$ that covers at least $\eta + x$ nodes. A difficulty arises here since $\delta$ can be arbitrarily close to 1, making $k \cdot (1 - \delta)\ln(\eta + x)$ arbitrarily small, for any given $\eta$ and $k$. However, as we argued in the proof of claim 1, for sufficiently large $\eta$, we can always find an $x$ with $k \le x \le k \cdot (1 - \delta)\ln(\eta + x)$. That is, on infinitely many instances of PSC, algorithm $A$ finds a set of $|S|$ class A nodes which $R$-covers $\eta + x$ nodes, for some $R \le \alpha \cdot 1$. Without loss of generality, we can assume $x \le \eta$. Choose the smallest value of $x$ such that the solution $S$ covers at least $\eta$ class B nodes. This implies the number of class A nodes covered is at most $x$, and so $|S| \le x$. Thus, on all such instances, algorithm $A$ gives a solution $S$ of size at most $x$, where $k \le x \le k \cdot (1 - \delta)\ln(\eta + x)$, that covers at least $\eta$ nodes. If we show that the upper bound is equal to $k \cdot (1 - \delta')\ln\eta$ for some $\delta' > 0$, we are done, since PSC cannot be approximated within a factor of $(1 - \delta')\ln\eta$ unless $NP \subseteq DTIME(n^{O(\log\log n)})$ (Feige 1998).

Let $(1 - \delta')\ln\eta = (1 - \delta)\ln(\eta + x)$, which yields $\delta' = 1 - (1 - \delta)\ln(\eta + x)/\ln\eta$. It is easy to see that by choosing sufficiently large $\eta$, we can make the gap between $\delta$ and $\delta'$ arbitrarily small and thus can always ensure $\delta' > 0$ on infinitely many instances of PSC, on each of which algorithm $A$ will serve as a $(1 - \delta')\ln\eta$-approximation algorithm, proving claim 2. □
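To see the convergence numerically, note that $\delta'$ decreases in $x$, so under the assumption $x \le \eta$ the worst case is $x = \eta$, where $\ln(2\eta)/\ln\eta = 1 + \ln 2/\ln\eta \to 1$. The sketch below is our own illustration (the names are hypothetical) and evaluates $\delta'$ in that worst case.

```python
import math


def delta_prime_claim2(delta: float, eta: float, x: float) -> float:
    """delta' = 1 - (1 - delta) * ln(eta + x) / ln(eta); requires eta > 1."""
    return 1 - (1 - delta) * math.log(eta + x) / math.log(eta)


def worst_case_delta_prime(delta: float, eta: float) -> float:
    """With x <= eta (assumed WLOG in the proof), delta' is smallest at x = eta."""
    return delta_prime_claim2(delta, eta, eta)


# delta' is negative for small eta, then turns positive and approaches delta as eta grows.
for eta in (1e2, 1e4, 1e6, 1e8):
    print(f"eta={eta:.0e}  delta'={worst_case_delta_prime(0.1, eta):+.4f}")
```

The values stay strictly below $\delta$ for every finite $\eta$, since $\ln(2\eta)/\ln\eta > 1$, which is why the argument needs an arbitrarily small but positive gap rather than $\delta' = \delta$.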

Note that in the proofs of both claims of the above theorem, by choosing $\eta$ sufficiently large, we can always ensure, for any given $k$ and $\delta > 0$, that the corresponding $\delta'$ is greater than 0. To prove the tricriteria hardness results, we need the following lemma.

Lemma 2 In the MC (or PSC) problem, let $k$ be the minimum number of sets needed to cover at least $\eta$ elements. Then, unless $NP \subseteq DTIME(n^{O(\log\log n)})$, there does not exist a PTIME algorithm that is guaranteed to select $\beta k$ sets covering at least $\gamma\eta$ elements where

1. $\beta \in [1, \ln\eta)$ and $\gamma > 1 - 1/e^{\beta}$; or

2. $\gamma \in (0, 1 - 1/\eta]$ and $\beta = (1 - \delta)\ln(1/(1-\gamma))$ for any fixed $\delta > 0$.



Lemma 2 is proved in Appendix A. We are ready to prove Theorem 5.

Proof of Theorem 5 Again, it suffices to prove the theorem for RAKC. For claim 1, we reduce MC to RAKC and for claim 2, we reduce PSC to RAKC. The reduction is the same as in the proof of Theorem 4 and we skip the details here. Below, we refer to instances $\mathcal{I}$ and $\mathcal{J}$ as in that proof.

We first prove claim 1. Given any $\beta$, set $k' = k$ and $\eta' = \eta + \beta k$. Assume there exists a PTIME $(\alpha, \beta, \gamma)$-approximation algorithm $A$ for RAKC which approximates the problem within the factors mentioned in claim 1. Apply algorithm $A$ to the instance $\mathcal{J}$. The coverage by the output seed set $S$ will be $f_R(S) \ge (1 - 1/e^{\beta} + \delta)\cdot(\eta + \beta k)$ nodes, implying the number of class B nodes covered is at least $(1 - 1/e^{\beta} + \delta)(\eta + \beta k) - \beta k = (1 - 1/e^{\beta} + \delta - (1/e^{\beta} - \delta)\beta k/\eta)\,\eta$. Thus the algorithm approximates MC within a factor of $(1 - 1/e^{\beta} + \delta - (1/e^{\beta} - \delta)\beta k/\eta)$.

If we show $\delta - (1/e^{\beta} - \delta)\beta k/\eta > 0$, then the claim follows, since MC cannot be approximated within a factor of $(1 - 1/e^{\beta} + \delta')$ for any $\delta' > 0$ unless $NP \subseteq DTIME(n^{O(\log\log n)})$, by Lemma 2. Let $\delta' = \delta - (1/e^{\beta} - \delta)\beta k/\eta$. For any $\beta \in [1, \ln\eta)$, $\delta'$ is an increasing function of $\eta$ which approaches $\delta$ in the limit. Thus, given any fixed $\delta > 0$, there must exist some $\eta_0$ such that $\delta' > 0$ for any $\eta > \eta_0$. This proves the first claim (by an argument similar to that in Theorem 4).
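For completeness, the algebra behind $\delta'$ and an explicit choice of $\eta_0$ are spelled out below; the closed form for $\eta_0$ is our own rearrangement of $\delta' > 0$ and is not stated in the original proof.

$$
\left(1-\frac{1}{e^{\beta}}+\delta\right)(\eta+\beta k)-\beta k
=\left(1-\frac{1}{e^{\beta}}+\delta\right)\eta-\left(\frac{1}{e^{\beta}}-\delta\right)\beta k
=\left(1-\frac{1}{e^{\beta}}+\delta'\right)\eta,
\qquad
\delta'=\delta-\left(\frac{1}{e^{\beta}}-\delta\right)\frac{\beta k}{\eta}.
$$

$$
\text{If } \delta<e^{-\beta}:\quad \delta'>0 \iff \eta>\eta_0:=\frac{\beta k\,(e^{-\beta}-\delta)}{\delta};
\qquad
\text{if } \delta\ge e^{-\beta}:\quad \delta'\ge\delta>0 \ \text{ for every } \eta>0.
$$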

Next, we prove the second claim. Set $k' = k$ and $\eta' = \eta + x$. The value of $x$ will be decided later. Assume that there exists a PTIME $(\alpha, \beta, \gamma)$-approximation algorithm $A$ for RAKC where the factors $\alpha$, $\beta$ and $\gamma$ satisfy the conditions mentioned in claim 2. Apply the algorithm to instance $\mathcal{J}$. For any $\gamma_j \in (0, 1 - 1/(\eta + x)]$, it gives a solution of size at most $k \cdot (1 - \delta)\ln(1/(1 - \gamma_j))$ that covers $\gamma_j \cdot (\eta + x)$ nodes. There can be $|S|$ possible choices of $x$. Pick the smallest $x$ such that the number of nodes covered in class B is at least $\gamma_j\eta$, implying that the number of nodes picked from class A is $\gamma_j x$. Thus, $\gamma_j x \le k \cdot (1 - \delta)\ln(1/(1 - \gamma_j))$. The existence of an $x$ satisfying this inequality can be established as done for claim 2 in Theorem 4.

Thus, algorithm $A$ gives a solution to instance $\mathcal{I}$ of size at most $k \cdot (1 - \delta)\ln(1/(1 - \gamma_j))$ that covers $\gamma_j\eta$ elements in $U$, where $\gamma_j \in (0, 1 - 1/(\eta + x)]$. If we show that for any given $\delta > 0$ and $\gamma_j$ in this range, there exist some $\delta' > 0$ and $\gamma_i \in (0, 1 - 1/\eta]$ such that $\gamma_j\eta \ge \gamma_i(\eta + x)$ and $(1 - \delta')\ln(1/(1 - \gamma_i)) = (1 - \delta)\ln(1/(1 - \gamma_j))$, then the claim follows. Let $Z = \ln(1/(1 - \gamma_j)) / \ln(1/(1 - \gamma_i))$; then $\delta' = 1 - (1 - \delta)Z$.

Whenever $\gamma_j \le 1 - 1/\eta$, we can always choose $\gamma_i \ge \gamma_j$ such that $\delta' > 0$. The non-trivial case is when $\gamma_j \in (1 - 1/\eta, 1 - 1/(\eta + x)]$. In this case, by choosing a large enough $\eta$, we can make $Z$ arbitrarily close to 1 and make $\delta' > 0$. In other words, there exists some $\eta_0$ such that $\delta' > 0$ for all $\eta > \eta_0$, and by an argument similar to that for claim 2 in Theorem 4, the claim follows. □
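A sketch of the non-trivial case shows how large $\eta$ must be. It rests on the additional assumption, taken from the proof of Theorem 4, that $x \le \eta$, and on the choice $\gamma_i = 1 - 1/\eta$. The explicit bound is our own illustration rather than part of the original argument.

$$
\gamma_j \le 1-\frac{1}{\eta+x}\;\Rightarrow\;\ln\frac{1}{1-\gamma_j}\le\ln(\eta+x),
\qquad
\ln\frac{1}{1-\gamma_i}=\ln\eta,
$$

$$
Z\le\frac{\ln(\eta+x)}{\ln\eta}\le\frac{\ln(2\eta)}{\ln\eta}=1+\frac{\ln 2}{\ln\eta},
\qquad
\delta'=1-(1-\delta)Z\;\ge\;1-(1-\delta)\left(1+\frac{\ln 2}{\ln\eta}\right),
$$

and the right-hand side is positive exactly when $\eta > 2^{(1-\delta)/\delta}$ (for $0 < \delta < 1$ and $\eta > 1$).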



5.2 A Tri-criteria Approximation 

We now consider upper bounds for MINTIME. It is interesting to ask what happens when the allowed budget overrun and coverage shortfall are increased. We show that under these conditions, a greedy strategy combined with linear search yields a solution with optimal propagation time. This proves Theorem 6.

Algorithm Greedy-Mintss computes a small seed set $S$ that achieves coverage at least $\eta - \epsilon$. Recall that $\sigma_m^R(S)$ denotes the coverage of $S$ under propagation model $m$ within $R$ time steps. It is easy to see that Greedy-Mintss can be adapted to instead compute a seed set that yields coverage $\eta - \epsilon$ within $R$ time steps: we call this algorithm Greedy-Mintss$^R$.

Given such an algorithm, a simple linear search over $R = 0, \ldots, n - 1$ yields the bounds specified in Theorem 6 after setting the coverage threshold to $\eta - \epsilon$ and the chosen budget threshold to $k(1 + \ln(\eta/\epsilon))$. The approximation factors in the theorem follow from Theorem Q] and LcmmaO. These bounds continue to hold if we can only provide estimates for the coverage function (rather than computing it exactly) and also extend to weighted nodes.
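The procedure just described can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The graph format and the names `ic_coverage_within`, `greedy_mintss_r` and `greedy_mintime` are hypothetical. Coverage within $R$ steps is estimated by Monte Carlo simulation of the IC model; the experiments in Section 6 use 10,000 runs. Ties in marginal gain are broken by node order.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical graph format: node -> list of (out-neighbor, IC probability).
Graph = Dict[int, List[Tuple[int, float]]]
Coverage = Callable[[Set[int], int], float]


def ic_coverage_within(graph: Graph, seeds: Set[int], R: int,
                       runs: int = 1000, rng_seed: int = 0) -> float:
    """Monte Carlo estimate of sigma^R(S): mean number of nodes active within R steps."""
    rng = random.Random(rng_seed)  # fixed seed: the same random draws for every call
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        for _ in range(R):
            newly_active = []
            for u in frontier:  # each newly active node gets one attempt per out-arc
                for v, p in graph.get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            if not newly_active:
                break
            frontier = newly_active
        total += len(active)
    return total / runs


def greedy_mintss_r(nodes: List[int], coverage: Coverage, R: int,
                    target: float) -> Tuple[List[int], float]:
    """Add the node with the largest marginal gain until coverage within R steps >= target."""
    seeds: List[int] = []
    current = 0.0
    while current < target:
        best, best_val = None, current
        for v in nodes:
            if v in seeds:
                continue
            val = coverage(set(seeds) | {v}, R)
            if val > best_val:
                best, best_val = v, val
        if best is None:  # no node increases coverage: target unreachable at this R
            break
        seeds.append(best)
        current = best_val
    return seeds, current


def greedy_mintime(nodes: List[int], coverage: Coverage, k: int,
                   eta: float, eps: float) -> Optional[Tuple[int, List[int]]]:
    """Linear search over R = 0, ..., n-1 for the first R whose greedy seed set
    reaches eta - eps using at most k * (1 + ln(eta / eps)) seeds."""
    budget = k * (1 + math.log(eta / eps))
    target = eta - eps
    for R in range(len(nodes)):
        seeds, cov = greedy_mintss_r(nodes, coverage, R, target)
        if cov >= target and len(seeds) <= budget:
            return R, seeds
    return None


# Usage on a toy graph whose arcs all have probability 1 (so the estimate is exact).
graph = {0: [(1, 1.0), (2, 1.0), (3, 1.0)], 4: [(5, 1.0)], 5: [(6, 1.0)]}


def cov(S: Set[int], R: int) -> float:
    return ic_coverage_within(graph, S, R, runs=20)


print(greedy_mintime(list(range(7)), cov, k=1, eta=4, eps=1))  # (1, [0])
```

Reusing one random seed for every call makes the estimated coverage a fixed function of the seed set, so marginal gains are compared on the same simulated cascades. In the toy run, $R = 0$ needs three seeds, which exceeds the relaxed budget $1 + \ln 4 \approx 2.39$, so the search stops at $R = 1$ with the single seed 0.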

We conclude this section by noting that the algorithm above can be naturally adapted to the RAKC problem. The bounds in Theorem 6 apply to RAKC as well, since MINTIME under IC generalizes RAKC.





                         NetHEPT           Meme
#Nodes                   15233             7418
#Arcs                    62794             39170
Avg. degree              4.12              5.28
#CC (strong)             1781              4552
maxCC (strong)           6794 (44.601%)    2851 (38.434%)
Clustering coefficient   0.31372           0.06763

Table 1 Network statistics: number of nodes and directed arcs with non-null probability, average degree, number of (strongly) connected components, size of the largest one, and clustering coefficient.



Note that WC is a special case of IC where probabili- 
ties on arcs are not necessarily uniform. 



6 Empirical Assessment 

We conducted several experiments to assess the value of the approximation algorithms by comparing their quality against that achieved by several well-known heuristics, as well as against the state-of-the-art methods developed for MAXINF, which we adapt in order to deal with MINTSS and MINTIME. In particular, the goals of the experimental evaluation are two-fold. First, we have previously established from theoretical analysis that the Greedy algorithm (Greedy-Mintss for MINTSS and Greedy-Mintss$^R$ for MINTIME) provides the best approximation guarantee that can be obtained in PTIME, which we would like to validate empirically. Second, we study the gap in quality between the solutions obtained from various heuristics and those of the Greedy algorithm, which serves as the upper bound.

In what follows we assume the IC propagation 
model. 

Datasets, probabilities and methods used. We use two real-world networks, whose statistics are reported in Table 1.

The first network, called NetHEPT, is the same one used in Chen et al (2009, 2010a,b). It is an academic collaboration network extracted from the "High Energy Physics - Theory" section of arXiv, with nodes representing authors and edges representing coauthorship. This is clearly an undirected graph, but we consider it directed by taking, for each edge, the arcs in both directions. Following Kempe et al (2003); Chen et al (2009, 2010a), we assign probabilities to the arcs in two different ways: uniform, where each arc has probability 0.1 (or probability 0.01), and weighted cascade (WC), i.e., the probability of an arc $(v, u)$ is $p_{v,u} = 1/d_{in}(u)$, where $d_{in}(\cdot)$ indicates in-degree (Kempe et al 2003).
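The two probability assignments are simple to reproduce. The sketch below is our own illustration; the function names and the string node labels are hypothetical. It symmetrizes an undirected coauthorship edge list and then assigns either uniform or weighted-cascade probabilities, $p_{v,u} = 1/d_{in}(u)$.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Arc = Tuple[str, str]


def symmetrize(edges: Iterable[Tuple[str, str]]) -> List[Arc]:
    """Replace each undirected edge {a, b} by the arcs (a, b) and (b, a), dropping duplicates."""
    arcs = set()
    for a, b in edges:
        if a != b:
            arcs.add((a, b))
            arcs.add((b, a))
    return sorted(arcs)


def uniform(arcs: List[Arc], p: float = 0.1) -> Dict[Arc, float]:
    """Every arc gets the same probability (0.1 or 0.01 in the experiments)."""
    return {arc: p for arc in arcs}


def weighted_cascade(arcs: List[Arc]) -> Dict[Arc, float]:
    """WC model: p(v, u) = 1 / d_in(u), so the probabilities on arcs entering u sum to 1."""
    in_degree = Counter(u for _, u in arcs)
    return {(v, u): 1.0 / in_degree[u] for v, u in arcs}


arcs = symmetrize([("a", "b"), ("a", "c")])
print(weighted_cascade(arcs))
# {('a', 'b'): 1.0, ('a', 'c'): 1.0, ('b', 'a'): 0.5, ('c', 'a'): 0.5}
```

The toy output illustrates the effect discussed later in this section: the hub `a` has in-degree 2, so the arcs entering it get the lower probability 0.5.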



Method        Description
Random        Simply add nodes at random to the seed set, until the stopping condition is met.
High Degree   Greedily add the highest degree node to the seed set, until the stopping condition is met.
PageRank      The popular index of nodes' importance. We run it with the same setting used in Chen et al (2010a).
Sp            The shortest-path based heuristic for the greedy algorithm introduced in Kimura and Saito (2006).
Pmia          The maximum influence arborescence method of Chen et al (2010a) with parameter $\theta = 1/320$.
Greedy        Algorithm Greedy-Mintss for MINTSS and Algorithm Greedy-Mintss$^R$ for MINTIME.

Table 2 The methods used in our experiments.

The second one, called Meme, is a sample of the social network underlying the Yahoo! Meme microblogging platform. Nodes are users, and directed arcs from a node $u$ to a node $v$ indicate that $v$ "follows" $u$. For this dataset, we also have the log of post propagations during 2009. We sampled a connected subgraph of the social network containing the users that participated in the most re-posted items. The availability of post propagations is significant since it allows us to directly estimate actual influence.

In particular, here a propagation is defined based on reposts: a user posts a meme, and if other users like it, they repost it, thus creating cascades. For each meme $m$ and for each user $u$, we know exactly from which other user she reposted, that is, we have a relation $repost(u, v, m, t)$ where $t$ is the time at which the repost occurs, and $v$ is the user from whom the information flowed to user $u$. The maximum likelihood estimator of the probability of influence corresponding to an arc is $p_{v,u} = M_{v2u}/M_{vu}$, where $M_{vu}$ denotes the number of memes that $v$ posted before $u$, and $M_{v2u}$ denotes the number of memes $m$ such that $repost(u, v, m, t)$ holds.
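The estimator can be computed from the propagation log with two counting passes. The sketch below is our own illustration under assumed input formats. It takes `posts` as (user, meme, time) triples for every post, reposts included, and `reposts` as (u, v, meme, time) tuples mirroring the $repost(u, v, m, t)$ relation. These names and formats are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Arc = Tuple[str, str]


def estimate_probabilities(posts: Iterable[Tuple[str, str, int]],
                           reposts: Iterable[Tuple[str, str, str, int]]) -> Dict[Arc, float]:
    """Maximum likelihood estimate p(v, u) = M_v2u / M_vu for every pair with M_vu > 0."""
    # Earliest time at which each user posted each meme.
    first_post: Dict[str, Dict[str, int]] = defaultdict(dict)
    for user, meme, t in posts:
        times = first_post[meme]
        times[user] = min(t, times.get(user, t))
    # M_vu: number of memes that v posted before u.
    m_vu: Dict[Arc, int] = defaultdict(int)
    for times in first_post.values():
        for v, tv in times.items():
            for u, tu in times.items():
                if tv < tu:
                    m_vu[(v, u)] += 1
    # M_v2u: number of distinct memes m such that repost(u, v, m, t) holds for some t.
    m_v2u: Dict[Arc, int] = defaultdict(int)
    for v, u, _meme in {(v, u, m) for u, v, m, _t in reposts}:
        m_v2u[(v, u)] += 1
    return {arc: m_v2u[arc] / count for arc, count in m_vu.items()}


posts = [("a", "m1", 1), ("b", "m1", 2), ("c", "m1", 3),
         ("a", "m2", 1), ("b", "m2", 5),
         ("b", "m3", 1), ("a", "m3", 2),
         ("a", "m4", 1), ("b", "m4", 2)]
reposts = [("b", "a", "m1", 2), ("c", "b", "m1", 3), ("b", "a", "m2", 5), ("a", "b", "m3", 2)]
p = estimate_probabilities(posts, reposts)
```

In this toy log, $a$ posted three memes before $b$ and $b$ reposted two of them from $a$, so the sketch estimates $p_{a,b} = 2/3$.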
Fig. 1 Experimental results on MINTSS.



For the sake of comparison, we adapt the state-of-the-art methods developed for MAXINF (also see Section 3) to deal with MINTSS and MINTIME. For most of the techniques the adaptation is straightforward. The methods that we use in the experiments are summarized in Table 2. It is noteworthy that PMIA is one of the state-of-the-art heuristic algorithms proposed for MAXINF under the IC model by Chen et al (2010a). In all our experiments, we run 10,000 Monte Carlo simulations for estimating coverage.

MINTSS - Our experimental results on the MINTSS problem are reported in Figure 1. In each of the three plots, we report, for a given coverage threshold (x-axis), the size of the seed set (budget, reported on the y-axis) that each method needs to achieve such coverage. As Greedy provides the upper bound on the quality that can be achieved in PTIME, it outperforms the other methods in all the experiments, with Random and High Degree consistently performing the worst.

We analyzed the probability distributions of the various data sets we experimented with. At one extreme is the model with uniformly low probabilities (0.01). In Meme, about 80% of the probabilities are < 0.05. In NetHEPT WC, on the other hand, approximately 83% of the probabilities are > 0.05 and about 66% of the probabilities are > 0.1. However, the combination of a power-law distribution of node degrees in NetHEPT with the assignment of low probabilities to high-degree nodes (since the probability is the reciprocal of the in-degree) has the effect of making central nodes poor influence spreaders, and the arcs with high influence probability are precisely those incident to nodes with a very low degree. This makes for a low influence graph overall, i.e., propagation of influence is limited. Finally, at the other extreme is the model with uniformly high probabilities (0.1), which corresponds to a high influence graph.



We tested uniformly low probabilities (0.01) and observed that with such low probabilities, there is limited propagation: for instance, in order to achieve a coverage of 150, even the best method requires more than 100 seeds. This makes the quality of all algorithms look similar.

On data sets where there is a non-uniform mix of low and high probabilities, with the probabilities being predominantly low, as well as on data sets corresponding to low influence graphs, the Pmia method of Chen et al (2010a) and the Sp method of Kimura and Saito (2006), originally developed as efficient heuristics for the MAXINF problem, continue to provide a good approximation of the results achieved by the Greedy algorithm when adapted to the MINTSS problem (Figure 1(a), (c)). In these situations, the Random and High Degree heuristics provide seed sets much larger than Greedy. In NetHEPT WC (Figure 1(a)), PageRank has a performance that is close to the Greedy solution, while in Meme (Figure 1(c)), the seed set generated by PageRank is much larger than Greedy's. In data sets with uniformly high probabilities (0.1), the gap between Greedy and the other heuristics is substantial (Figure 1(b)). Greedy can achieve a target coverage $\eta = 750$ with just 5 seeds, while Pmia and Sp need 35 and 21 seeds respectively; similarly, Greedy can achieve a target coverage $\eta = 1000$ with just 58 seeds, while Pmia and Sp need 117 and 90 seeds respectively. It is worth noting that Random, High Degree, and the PageRank heuristic all generate seed sets much larger than Greedy on this data set. To sum up, the gap between the sizes of the seed sets obtained by the heuristics on the one hand and the Greedy algorithm on the other varies depending on the influence probabilities on the edges. In general, on graphs with high influence, the gap can be substantial.

MINTIME - Our experimental results on the MINTIME problem are reported in Figures 2 and 3. In Figure 2, we report, for a coverage threshold given on the x-axis and a fixed budget (75 for NetHEPT, 150 for Meme), the minimum time steps needed to achieve such coverage with the given budget (y-axis). As expected, Greedy outperforms all the heuristics. All the plots show that after a certain time, there is no further gain in the coverage, indicating that the influence decays over time. Figure 2(a) compares the various heuristics with Greedy on the NetHEPT dataset under the WC model. On this data set, Pmia, Sp and Greedy exhibit comparable performance. The PageRank heuristic comes close to them.

Fig. 2 Experimental results on MINTIME with fixed budget (x-axis: coverage threshold): (a) NetHEPT-WC, Budget=75; (b) NetHEPT-Uniform, Budget=75; (c) Meme, Budget=150.

Fig. 3 Experimental results on MINTIME with fixed coverage threshold.

Figure 2(b) shows the results for the NetHEPT dataset under the IC model with uniform probability 0.1. Here, Greedy outperforms all the other heuristics. For instance, when the coverage threshold $\eta$ is 900 and the budget is 75, Greedy achieves the coverage in 5 time steps, Sp in 6 time steps, and Pmia in 14 time steps, while Random, High Degree and PageRank fail to find a solution. Similarly, when the coverage threshold is 1000 and the budget is 75, Greedy achieves the coverage in 6 steps, whereas all other heuristics fail to find a solution with this coverage.

Finally, Figure 2(c) shows the results on the Meme dataset. As we increase the target coverage, the other heuristics fail to give a solution, one by one. Beyond $\eta = 1600$, all but Sp and Pmia fail, and beyond $\eta = 2000$, all but Pmia fail. On this data set, Pmia provides a good approximation to the performance of Greedy.

In Figure 3, we fix the coverage threshold ($\eta = 1000$ for all the plots). The plots show the minimum time steps needed to achieve the coverage w.r.t. different seed set sizes (budget). In all the cases, Random fails to find a solution and hence is not shown in the plots. The performance of the High Degree algorithm is poor as well, and it fails to find a solution in the case of NetHEPT with uniform probabilities 0.1. As expected, Greedy outperforms all the heuristics, achieving the shortest propagation time among all the methods for the required coverage with a given budget.

Overall, we notice that the performance quality of 
all other heuristics compared to Greedy follows a sim- 
ilar pattern to that observed in case of MINTSS: as 
the graph changes from a low influence graph to a high 
influence graph, the heuristics' performance drops sub- 
stantially compared to Greedy. 

Another key takeaway from the MINTIME plots is the following. For a given budget, as observed above, the choice of the seed set plays a key role in determining whether a given coverage threshold can be reached or not, no matter how much time we allow for the influence to propagate. Even if the given coverage threshold is achieved, the choice of the seed set can make a big difference to the number of time steps in which the coverage threshold is reached. Often, for a given budget, relaxing the coverage threshold can dramatically change the propagation time. E.g., in Figure 2(a) (budget fixed to 75), while Greedy takes 8 time steps to achieve a coverage of 1200, when we relax the threshold to 1100, the propagation time decreases by 50%, that is, to just 4 time steps. A similar phenomenon is observed when the budget is boosted w.r.t. a fixed coverage threshold. For instance, in Figure 3(c), while Greedy takes 6 time steps to achieve a coverage of 1000 using 15 seeds, it achieves the same coverage with 30 seeds in 33% of the time, that is, in 2 time steps. These findings further highlight the importance of the MINTIME problem.

7 Conclusions 

In this paper, we study two optimization problems in social influence propagation: MINTSS and MINTIME. We present a bicriteria approximation for MINTSS which delivers a seed set at most a logarithmic factor $(1 + \ln(\eta/\epsilon))$ larger than the optimal seed set and achieves a coverage of $\eta - \epsilon$, falling short of the coverage threshold by $\epsilon$. We also show a generic tightness result indicating that improving the above approximation factor is likely to be hard.

Turning to MINTIME, we give a greedy algorithm that provides a tricriteria approximation when allowed a budget overrun by a factor of $(1 + \ln(\eta/\epsilon))$ and a coverage shortfall of $\epsilon$, and achieves the optimal propagation time under these conditions. We also provide hardness results for this problem. We conduct experiments on two real-world networks to compare the quality of various popular heuristics proposed in a different context (with necessary adaptations) with that of the greedy approximation algorithms. Our results show that the greedy algorithms outperform the other methods in all the settings (as expected) but, depending on the characteristics of the data, some of the heuristics perform competitively. These include the recently proposed heuristics Pmia (Chen et al 2010a) and Sp (Kimura and Saito 2006), which we adapted to MINTSS and MINTIME.

Several questions remain open, including prov- 
ing optimal approximation bounds for MINTSS and 
MINTIME, as well as complexity results for these prob- 
lems under other propagation models. 
A Proof of Lemma 2

Suppose there exists an algorithm $A$ that selects $\beta k$ sets which cover $\gamma\eta$ elements. Apply $A$ to an arbitrary instance $(U, \mathcal{S}, \eta)$ of PSC. The output is a collection of sets $\mathcal{C}_1$ such that $|\mathcal{C}_1| \le \beta k$ and $|\bigcup_{S \in \mathcal{C}_1} S| \ge \gamma\eta$. Next, discard the sets that have been selected and the elements they cover, and apply the algorithm $A$ again on the remaining universe. Repeat this process until 1 or fewer elements are left uncovered.

Let $\eta_i$ denote the number of elements left uncovered after iteration $i$. In iteration $i$, the algorithm picks $\beta k$ sets and covers at least $\gamma\eta_{i-1}$ elements. Hence, $\eta_i \le \eta_{i-1}\cdot(1 - \gamma)$. Expanding, $\eta_i \le \eta\cdot(1 - \gamma)^i$. Suppose after $l$ iterations, $\eta_l = 1$. The total number of sets picked is $l\beta k$, and $\eta\cdot(1 - \gamma)^l = 1$ implies $l = \ln\eta / \ln(1/(1-\gamma))$.
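The iteration count is easy to check numerically. The following sketch is our own illustration (the names are hypothetical). It evaluates $l = \ln\eta/\ln(1/(1-\gamma))$ and the bound $\eta(1-\gamma)^i$ on uncovered elements. It also computes the ratio $l\beta k/(k\ln\eta) = \beta/\ln(1/(1-\gamma))$ that drives the first claim, which is below 1 exactly when $\gamma > 1 - 1/e^{\beta}$.

```python
import math
from typing import List


def iterations_needed(eta: float, gamma: float) -> float:
    """l solving eta * (1 - gamma)**l = 1, i.e. l = ln(eta) / ln(1 / (1 - gamma))."""
    return math.log(eta) / math.log(1 / (1 - gamma))


def uncovered_bounds(eta: float, gamma: float, rounds: int) -> List[float]:
    """Upper bounds eta_i <= eta * (1 - gamma)**i for i = 0, ..., rounds."""
    return [eta * (1 - gamma) ** i for i in range(rounds + 1)]


def size_ratio(beta: float, gamma: float) -> float:
    """(l * beta * k) / (k * ln eta) = beta / ln(1 / (1 - gamma)); < 1 iff gamma > 1 - e^-beta."""
    return beta / math.log(1 / (1 - gamma))


# eta = 1024, gamma = 1/2: about ten halvings leave a single element uncovered.
print(iterations_needed(1024, 0.5), uncovered_bounds(1024, 0.5, 10)[-1])
# beta = 1: gamma = 0.7 exceeds 1 - 1/e (about 0.632), gamma = 0.6 does not.
print(size_ratio(1.0, 0.7), size_ratio(1.0, 0.6))
```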

We now prove the first claim. Let $\gamma > 1 - 1/e^{\beta}$; then $\ln(1/(1-\gamma)) > \beta$. This yields a PTIME algorithm for PSC which outputs a solution of size $l\beta k = \beta k \cdot \ln\eta / \ln(1/(1-\gamma)) \le c \cdot k\ln\eta$ (for some $c < 1$). This yields a $c \cdot \ln\eta$-approximation for PSC for some $c < 1$, which is not possible unless $NP \subseteq DTIME(n^{O(\log\log n)})$ (Feige 1998).

To prove the second claim, assume $\beta \le (1 - \delta)\ln(1/(1-\gamma))$. This gives a PTIME algorithm for PSC which outputs a solution of size $l\beta k = \beta k \cdot \ln\eta / \ln(1/(1-\gamma)) \le (1 - \delta)k \cdot \ln\eta$, which is not possible unless $NP \subseteq DTIME(n^{O(\log\log n)})$. □



B Example Illustrating Performance of Wolsey's Solution

Wolsey (1982) studied the RSSC problem and showed, among other things, that the greedy algorithm provides a solution that is within a factor of $1 + \ln(\eta/(\eta - f(S_{t-1})))$ of the optimal solution. Unfortunately, this does not yield an approximation algorithm with any guaranteed bounds. The following example shows that the greedy solution with threshold $\eta$ can be arbitrarily worse than the optimum.
Fig. 4 Example. Rectangles represent the elements in the universe. The shaded area within a rectangle represents the coverage function $f$ for the element, e.g., $f(v_1) = 1/2 + 1/2 = 1$.



Example (illustrated also in Figure 4). Consider a ground set $X = \{w_1, w_2, v_1, \ldots, v_l\}$ with elements having unit costs. Figure 4 geometrically depicts the definition of a function $f : 2^X \to \mathbb{R}$, where for any set $S \subseteq X$, $f(S)$ is defined to be the area (shown shaded) covered by the elements of $S$. Specifically, $f(w_1) = f(w_2) = 1 - 1/2^{l+1}$ and $f(v_i) = 1/2^{i-1}$, $1 \le i \le l$. Notice that $f(\{v_1, \ldots, v_i\}) = \sum_{j=1}^{i} 1/2^{j-1} = 2 - 1/2^{i-1} < 2 - 1/2^{l} = f(\{w_1, w_2\})$. The greedy algorithm will first pick $v_1$. Suppose it picks $S = \{v_1, \ldots, v_i\}$ in $i$ rounds. Then $f(S \cup \{v_{i+1}\}) - f(S) = 1/2^{i} > 1 - 1/2^{l+1} - 1 + 1/2^{i} = 1 - 1/2^{l+1} - \frac{1}{2}(2 - 1/2^{i-1}) = f(S \cup \{w_1\}) - f(S)$. Thus, greedy will never pick $w_1$ or $w_2$ before it picks $v_l$. Suppose $\eta = 2 - 1/2^{l}$. Clearly, the greedy solution is $X$, whereas the optimal solution is $\{w_1, w_2\}$. Here $l$ can be arbitrarily large.
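The example can be checked mechanically. The sketch below is our own encoding of Figure 4 as a weighted coverage function. Its layout is an assumption about the figure, chosen so that $f(w_1) = f(w_2) = 1 - 1/2^{l+1}$ and $f(v_i) = 1/2^{i-1}$ as stated. Each of two unit rectangles is split into atoms of area $1/2^i$; $v_i$ covers the $i$-th atom of both rectangles, and $w_r$ covers atoms $1, \ldots, l+1$ of rectangle $r$. The names `build_example`, `f` and `greedy` are hypothetical.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List


def build_example(l: int) -> Dict[str, FrozenSet[str]]:
    """Map each element of X to the atoms "r:i" it covers; atom "r:i" has area 1/2**i."""
    cover: Dict[str, FrozenSet[str]] = {}
    for r in (1, 2):
        # w_r covers atoms 1..l+1 of rectangle r: all of it except a sliver of area 1/2**(l+1).
        cover[f"w{r}"] = frozenset(f"{r}:{i}" for i in range(1, l + 2))
    for i in range(1, l + 1):
        # v_i covers the atom of area 1/2**i in each rectangle, so f(v_i) = 1/2**(i-1).
        cover[f"v{i}"] = frozenset({f"1:{i}", f"2:{i}"})
    return cover


def f(cover: Dict[str, FrozenSet[str]], S: List[str]) -> Fraction:
    """Total area of the union of the atoms covered by the elements of S."""
    atoms = set().union(*(cover[x] for x in S))
    return sum((Fraction(1, 2 ** int(a.split(":")[1])) for a in atoms), Fraction(0))


def greedy(cover: Dict[str, FrozenSet[str]], eta: Fraction) -> List[str]:
    """Repeatedly add the element with the largest marginal gain until f(S) >= eta."""
    S: List[str] = []
    while f(cover, S) < eta:
        best = max((x for x in cover if x not in S), key=lambda x: f(cover, S + [x]))
        S.append(best)
    return S


l = 6
cover = build_example(l)
eta = 2 - Fraction(1, 2 ** l)
print(greedy(cover, eta), f(cover, ["w1", "w2"]) == eta)
```

Exact `Fraction` arithmetic avoids rounding ties between marginal gains. With $l = 6$ the greedy run returns all eight elements, $v_1, \ldots, v_6, w_1, w_2$, while $\{w_1, w_2\}$ alone reaches $\eta$. Increasing `l` widens the gap.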



Instead of 1, we could be left with a constant number of 
elements. Asymptotically, it does not make a difference. 



